Abstract

A simple explicit construction is provided of a partition-valued fragmentation process
whose distribution on partitions of $[n] = \{1, \ldots, n\}$ at time $\theta \ge 0$ is governed
by the Ewens sampling formula with parameter $\theta$. These partition-valued processes
are exchangeable and consistent, as $n$ varies. They can be derived by uniform sampling
from a corresponding mass fragmentation process defined by cutting a unit
interval at the points of a Poisson process with intensity $\theta x^{-1}\,dx$ on $\mathbb{R}_+$, arranged
to be intensifying as $\theta$ increases.



1 Introduction 

There has been much recent interest in models for random processes of fragmentation
and coagulation: see Chapter 5 of [13] and the recent book [3]. Mekjian and others
[…] have considered Ewens partitions with parameter $\theta$ as a model for fragmentation
phenomena, with the intuitive notion that increasing $\theta$ corresponds to further
fragmentation. But it does not seem obvious how to construct a nice Markovian
fragmentation process corresponding to this idea.

It was pointed out in [13] that it is possible to construct a sequence of partition-valued
processes $(\Pi_{n,\theta}, \theta \ge 0)$ $(n = 1, 2, \ldots)$ with the following properties:

• (Ewens distribution) $\Pi_{n,\theta}$ is for each $n = 1, 2, \ldots$ and $\theta \ge 0$ a random partition of
the set $[n] := \{1, 2, \ldots, n\}$, with distribution determined by the following formula:
for each composition $(n_1, \ldots, n_k)$ of $n = \sum_{i=1}^{k} n_i$, and each partition $\pi$ of $[n]$ into
blocks of sizes $n_1, \ldots, n_k$,

$$P(\Pi_{n,\theta} = \pi) = \frac{\theta^{k-1}}{[\theta + 1]_{n-1}} \prod_{i=1}^{k} (n_i - 1)! \tag{1}$$

where $[x]_m := x(x+1) \cdots (x+m-1)$ is the Pochhammer factorial;

• (fragmentation) $\Pi_{n,\theta}$ is a refinement of $\Pi_{n,\phi}$ if $\theta > \phi$, for all $n \ge 1$, that is each
block of $\Pi_{n,\phi}$ is some union of blocks of $\Pi_{n,\theta}$;

• (consistency) for each $n < m$ a process with the same distribution as $(\Pi_{n,\theta}, \theta \ge 0)$
is obtained by restriction of $(\Pi_{m,\theta}, \theta \ge 0)$ to $[n]$;

• (exchangeability) for each $n$ the law of $(\Pi_{n,\theta}, \theta \ge 0)$ is invariant under permutations
of the set $[n]$.

We call a sequence of partition-valued processes $(\Pi_{n,\theta}, \theta \ge 0)$ $(n = 1, 2, \ldots)$ with these
properties a family of Ewens fragmentations. One family of Ewens fragmentations
associated with Kingman's coalescent was analysed in [2].
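
As a sanity check on formula (1), the following minimal Python sketch enumerates all set partitions of $[n]$ for a small $n$ and adds up their probabilities in exact rational arithmetic. The helper names (`pochhammer`, `set_partitions`, `ewens_probability`) and the choice $\theta = 3/2$, $n = 5$ are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction
from math import factorial

def pochhammer(x, m):
    """Rising factorial [x]_m = x (x+1) ... (x+m-1)."""
    result = Fraction(1)
    for i in range(m):
        result *= x + i
    return result

def set_partitions(elements):
    """Yield every partition of the list `elements` as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new singleton block
        yield [[first]] + partition

def ewens_probability(partition, theta):
    """Formula (1): theta^(k-1) / [theta+1]_(n-1) * prod_i (n_i - 1)!."""
    theta = Fraction(theta)
    n = sum(len(block) for block in partition)
    k = len(partition)
    weight = Fraction(1)
    for block in partition:
        weight *= factorial(len(block) - 1)
    return theta ** (k - 1) * weight / pochhammer(theta + 1, n - 1)

theta = Fraction(3, 2)
partitions = list(set_partitions([1, 2, 3, 4, 5]))
total = sum(ewens_probability(p, theta) for p in partitions)
print(len(partitions), total)
```

Writing the normalisation as $[\theta + 1]_{n-1}$ rather than $[\theta]_n$ keeps the formula meaningful at $\theta = 0$, where all mass sits on the one-block partition.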

Some general theory [3] implies that any such family of processes can be defined in the
strong sense consistently (i.e. so that $\Pi_{m,\theta}|_{[n]} = \Pi_{n,\theta}$ for $m > n$) on a single probability
space by means of uniform sampling of points engaged in a process of fragmentation of
a total mass 1 into a countable collection of submasses with sum 1, with more and more
refined splitting of the submasses as the time parameter $\theta$ increases. See [3, Chapter 3]
and [13, Chapter 5] for further background and references.

The problem was posed in [2] of characterising the dynamics of a family of Ewens
fragmentations, preferably in a Markovian way. For applications, it is desirable to have
a model which can easily be simulated for modest values of $n$. But previous efforts fall
short in this respect. In this note we partly solve this problem by constructing a new
family of Ewens fragmentations. Our family is not Markovian, but it enjoys the Markov
property and follows a very simple transition rule when viewed as a fragmentation process
in the extended space of ordered partitions. This simplification by passing to an ordered
structure extends our previous work on regenerative partitions and their relatives […].

2 Construction 

Recall that a composition of $n$ is a sequence of positive integers $(n_1, \ldots, n_k)$ with sum $n$.
We regard a composition of $n$ as a way of distributing $n$ unlabelled balls in an ordered
sequence of $k$ non-empty boxes, with $n_i$ balls in the $i$th box. A composition of $n$ is also
conveniently encoded by the binary sequence of 0's and 1's obtained by concatenating
subsequences of the form 1, 10, 100, ..., where the $i$th subsequence in the concatenation
has length $n_i$. So the symbols 1 occur at places $1, n_1 + 1, n_1 + n_2 + 1, \ldots, \sum_{i=1}^{k-1} n_i + 1$.
Using a particular composition (3,4,1) of 8 for illustration, the balls-in-boxes picture
is suggested by the notation [000] [0000] [0]. The binary representation is obtained by
replacing each [0 by a 1 and deleting each ] to obtain the sequence 10010001. Let $x$ and $y$
be two compositions of $n$, each represented as a binary sequence, say $x = (x_i, 1 \le i \le n)$
and $y = (y_i, 1 \le i \le n)$. Say $x$ is a refinement of $y$ if $x_i \ge y_i$ for every $i$. In terms of
the balls-in-boxes picture, $x$ is derived from $y$ by splitting boxes into sub-boxes, and in
terms of the binary representation $x$ is derived by switching some 0's to 1's. For instance,
11010101 is a refinement of 10010001.
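
The binary encoding and the refinement order just described can be sketched in a few lines of Python. The function names below are hypothetical and chosen only for this illustration; the example values are the ones used in the text.

```python
def composition_to_bits(composition):
    """Encode (n_1, ..., n_k) as the concatenation of blocks 1, 10, 100, ..."""
    bits = []
    for size in composition:
        bits.extend([1] + [0] * (size - 1))
    return bits

def bits_to_composition(bits):
    """Decode a binary sequence starting with 1 back into block sizes."""
    sizes = []
    for bit in bits:
        if bit == 1:
            sizes.append(1)      # a 1 opens a new box
        else:
            sizes[-1] += 1       # a 0 adds a ball to the current box
    return tuple(sizes)

def is_refinement(x, y):
    """True if x refines y, i.e. x_i >= y_i at every place i."""
    return len(x) == len(y) and all(xi >= yi for xi, yi in zip(x, y))

y = composition_to_bits((3, 4, 1))
x = [int(c) for c in "11010101"]
print("".join(map(str, y)))      # 10010001
print(bits_to_composition(x))    # (1, 2, 2, 2, 1)
print(is_refinement(x, y))       # True
```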

Given a stochastic process $(C_{n,\theta}, \theta \ge 0)$ with values in compositions of $n$, we define an
associated partition-valued process $(\Pi_{n,\theta}, \theta \ge 0)$ by first assigning each of the $n$ places for
a ball in the balls-in-boxes representation a number in $[n]$ according to a uniform random
permutation of $[n]$ independent of $(C_{n,\theta}, \theta \ge 0)$, thus obtaining an ordered partition of $[n]$,
then ignoring the order of the boxes to obtain a partition of $[n]$. If $(C_{n,\theta}, \theta \ge 0)$ is refining
as $\theta$ increases then the associated partition-valued process $(\Pi_{n,\theta}, \theta \ge 0)$ is a fragmentation
process whose law is invariant under permutations of $[n]$. The process $(\Pi_{n,\theta}, \theta \ge 0)$ then
describes a process of randomly splitting up a collection of balls labelled by $[n]$ into an
unordered collection of boxes.

Theorem 1. Let $\Theta_j$ for $j = 1, 2, \ldots$ be a sequence of independent random variables
with distributions

$$P(\Theta_j \le \theta) = \theta/(\theta + j - 1), \quad \theta \ge 0 \tag{2}$$

(where 0/0 = 1). Let $C_{n,\theta}$ for $n = 1, 2, \ldots$ be the random composition of $n$ whose binary
representation is the sequence of indicator variables $1(\Theta_j \le \theta)$ for $1 \le j \le n$. Then the
sequence $(\Pi_{n,\theta}, \theta \ge 0)$ of partition-valued processes associated with $(C_{n,\theta}, \theta \ge 0)$ defines a
family of Ewens fragmentations.
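
For modest $n$ the construction in Theorem 1 is easy to simulate. The following is a minimal sketch, assuming the $\Theta_j$ are drawn by inverting the distribution function in (2), so that $\Theta_j = (j-1)u/(1-u)$ for a uniform $u$; the function names, the seed and the chosen time points are illustrative only.

```python
import random

def sample_thetas(n, rng):
    """Draw Theta_1, ..., Theta_n with P(Theta_j <= t) = t / (t + j - 1)."""
    thetas = [0.0]  # Theta_1 = 0, so the first digit is always 1
    for j in range(2, n + 1):
        u = rng.random()                          # uniform on [0, 1)
        thetas.append((j - 1) * u / (1.0 - u))    # inverse of the CDF in (2)
    return thetas

def composition_at(thetas, theta):
    """Binary encoding of C_{n,theta}: digits 1(Theta_j <= theta)."""
    return [1 if t <= theta else 0 for t in thetas]

def partition_at(thetas, labels, theta):
    """Blocks of Pi_{n,theta}: place j holds ball labels[j-1]; each digit 1 opens a box."""
    blocks = []
    for bit, label in zip(composition_at(thetas, theta), labels):
        if bit == 1:
            blocks.append([label])
        else:
            blocks[-1].append(label)
    return sorted(sorted(block) for block in blocks)

rng = random.Random(0)
n = 8
thetas = sample_thetas(n, rng)
labels = list(range(1, n + 1))
rng.shuffle(labels)  # uniform random permutation, independent of the Theta_j
for theta in (0.0, 0.5, 2.0, 10.0):
    print(theta, partition_at(thetas, labels, theta))
```

Because the same `thetas` and `labels` are reused for every time point, the printed partitions are successive refinements of one another.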

That $\Pi_{n,\theta}$ has the Ewens distribution (1) can be read from the known result […]
that in the binary representation of the composition of $n$ derived from the block sizes of a
Ewens partition of $[n]$ in reversed size-biased order, the digits are independent Bernoulli
variables with parameters $\theta/(\theta + j - 1)$ as in (2). The device (2) with independent variables
$\Theta_j$ is then the simplest way to make these indicators increasing in $\theta$ simultaneously for
all $j$ and $n$, which is all that is needed to make $(\Pi_{n,\theta}, \theta \ge 0)$ a Ewens fragmentation.
What is much less obvious is the consistency of these processes for various $n$. To put
this another way, if in the process of splitting a set of $m$ balls according to the indicators
$1(\Theta_j \le \theta)$ for $1 \le j \le m$ we pass to the balls-in-boxes picture and just observe the
splitting process restricted to a uniformly chosen random subset of $n < m$ balls, this
sub-process is identical in distribution to the process of splitting of the first $n$ balls using the
indicators $1(\Theta_j \le \theta)$ for $1 \le j \le n$. This sampling consistency property of compositions,
which is so intuitive in the balls-in-boxes picture, is quite painful to express entirely in the
binary encoding. We circumvent that difficulty by deriving consistency from the Poisson
representation of the corresponding mass fragmentation model, which is introduced in the
next section. See also […] for a more extensive discussion of such consistency properties of
partition structures derived by random sampling from self-similar random sets, like the
self-similar Poisson process in the next section or the zero set of a Brownian motion.
3 Poisson representation of the mass fragmentation

Let $(T_i, V_i)$ be a listing of the points of a homogeneous Poisson point process on the positive
quadrant with rate 1 per unit area. For each fixed $\theta \ge 0$, the random countable set

$$Z_\theta := \{T_i : 0 < T_i < 1 \text{ and } V_i \le \theta/T_i\}$$

is then the set of points of a Poisson process on [0, 1] with intensity $\theta t^{-1}\,dt$. To orient the
reader, we start by recalling some well known properties of $Z_\theta$ and the induced random
composition […].

(i) Let $Y_{1,\theta} > Y_{2,\theta} > \cdots$ be the points of $Z_\theta$ in decreasing order. Then

$$Y_{j,\theta} = \prod_{i=1}^{j} W_{i,\theta}$$

where the $W_{i,\theta}$ $(i = 1, 2, \ldots)$ are independent and identically distributed random
variables with beta($\theta$, 1) distribution.

(ii) If $P_{j,\theta}$ is the length of the $j$th component interval of $[0, 1] \setminus Z_\theta$, working from right
to left, that is $P_{j,\theta} = Y_{j-1,\theta} - Y_{j,\theta}$ where $Y_{0,\theta} := 1$, then

$$P_{j,\theta} = (1 - W_{j,\theta}) \prod_{i=1}^{j-1} W_{i,\theta}$$

for $W_{i,\theta}$ i.i.d. beta($\theta$, 1) as before. The distribution of this random discrete
probability distribution $(P_{j,\theta}, j \ge 1)$ is known as the GEM($\theta$) distribution.

(iii) The distribution of the decreasing rearrangement of $(P_{j,\theta}, j \ge 1)$ is the
Poisson-Dirichlet distribution with parameter $\theta$.

(iv) Let $U_1, U_2, \ldots$ be a sequence of independent uniform [0, 1] variables, independent
of $Z_\theta$, and define a random partition $\Pi_{\infty,\theta}$ of the set $\mathbb{N}$ of positive integers to be
the collection of equivalence classes for the random equivalence relation: $i \sim_\theta j$
if and only if either $i = j$ or both $U_i$ and $U_j$ fall in the same component interval
of $[0, 1] \setminus Z_\theta$. Then $\Pi_{\infty,\theta}$ is an exchangeable random partition of the infinite set $\mathbb{N}$,
whose restriction $\Pi_{n,\theta}$ to $[n]$ is a Ewens partition governed by (1) for each $n$.

(v) Let $U_{n,j}$ be the $j$th smallest value among $U_1, \ldots, U_n$, and let $X_{n,j}(\theta)$ be the indicator
of the event that $U_{n,j}$ is the least value among those of the $n$ values which fall in
some component interval of $[0, 1] \setminus Z_\theta$. Then for each fixed $n$ and $\theta$, the $X_{n,j}(\theta)$ for
$j = 1, 2, \ldots, n$ are independent, with

$$P(X_{n,j}(\theta) = 1) = \theta/(\theta + j - 1). \tag{3}$$

More precisely, the sequence $(X_{n,j}(\theta), 1 \le j \le n)$ is the binary encoding of a random
composition $C_{n,\theta}$ of $n$ which is a particular ordering of the sizes of blocks of $\Pi_{n,\theta}$. If
the blocks of $\Pi_{n,\theta}$ are, say, $B_1, \ldots, B_k$, then the sizes of these blocks appear in the
composition in increasing order of values of $\min_{i \in B_j} U_i$.¹
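
Properties (i) and (v) can be checked by simulation. The sketch below is a Monte Carlo illustration under stated assumptions: for a fixed $\theta > 0$ it generates the points of $Z_\theta$ from the right via the products in (i), stopping once they fall below the smallest sample point, and compares the empirical frequency of $X_{n,j}(\theta) = 1$ with (3). Function names, the seed and the parameter values are hypothetical.

```python
import random

def cut_points_above(theta, level, rng):
    """Points of Z_theta in (level, 1), generated in decreasing order as
    Y_j = W_1 * ... * W_j with W_i i.i.d. beta(theta, 1), as in (i)."""
    points, y = [], 1.0
    while True:
        y *= (1.0 - rng.random()) ** (1.0 / theta)  # beta(theta, 1) by inverse CDF
        if y <= level:
            return points
        points.append(y)

def composition_bits(theta, n, rng):
    """Indicators X_{n,1}(theta), ..., X_{n,n}(theta) of (v) for n uniform sample points."""
    u = sorted(rng.random() for _ in range(n))
    cuts = cut_points_above(theta, u[0], rng)
    bits = [1]  # the smallest sample point always starts a block
    for j in range(1, n):
        # digit 1 iff some cut point separates consecutive order statistics
        bits.append(1 if any(u[j - 1] < c < u[j] for c in cuts) else 0)
    return bits

rng = random.Random(1)
theta, n, trials = 1.5, 5, 20000
counts = [0] * n
for _ in range(trials):
    for j, bit in enumerate(composition_bits(theta, n, rng)):
        counts[j] += bit
for j in range(1, n + 1):
    # empirical frequency of X_{n,j} = 1 next to theta / (theta + j - 1)
    print(j, counts[j - 1] / trials, theta / (theta + j - 1))
```

Only the cut points above the smallest sample point matter for the induced composition, which is why the infinitely many points of $Z_\theta$ near 0 never need to be generated.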

An immediate consequence of the above construction of $Z_\theta$ from a Poisson process in
the positive quadrant is that the random set $Z_\theta$ increases as $\theta$ increases. Consequently, the
various quantities introduced above describe a process of fragmentation of [0, 1] into
subintervals. In particular, the partition-valued process $(\Pi_{n,\theta}, \theta \ge 0)$ is refining as $\theta$ increases,
and each of the processes $(X_{n,j}(\theta), \theta \ge 0)$ is increasing as $\theta$ increases. Consistency and
exchangeability of the processes $(\Pi_{n,\theta}, \theta \ge 0)$ are obvious from (iv).

The proof of the consistency property claimed in Theorem 1 is completed by the following
lemma, which shows that in this Poisson setup the indicator variables $(X_{n,j}(\theta), 1 \le j \le n)$
in the binary expansion of the composition $C_{n,\theta}$ associated with the natural ordering
of blocks of $\Pi_{n,\theta}$ (as in (v)) can be derived from independent variables $(\Theta_{n,j}, 1 \le j \le n)$
with the same distribution as $(\Theta_j, 1 \le j \le n)$ in Theorem 1.

Lemma 2. Let $\Theta_{n,j} := \inf\{\theta : X_{n,j}(\theta) = 1\}$, so that $X_{n,j}(\theta)$ may be represented as

$$X_{n,j}(\theta) = 1(\Theta_{n,j} \le \theta).$$

Then for each fixed $n$ the $\Theta_{n,j}$ are independent random variables with

$$P(X_{n,j}(\theta) = 1) = P(\Theta_{n,j} \le \theta) = \theta/(\theta + j - 1), \quad (j = 1, \ldots, n).$$

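Proof. Fix $\theta_j \ge 0$ for $j = 2, \ldots, n$. Observe that the event $(\Theta_{n,j} > \theta_j)$ occurs if and only
if there is no point $(T_i, V_i)$ of the Poisson process with $T_i \in [U_{n,j-1}, U_{n,j}]$ and $V_i \le \theta_j/T_i$.
Therefore

$$P\Big(\bigcap_{j=2}^{n} (\Theta_{n,j} > \theta_j) \,\Big|\, U_{n,j}, 1 \le j \le n\Big) = \exp\Big(-\sum_{j=2}^{n} \int_{U_{n,j-1}}^{U_{n,j}} \frac{\theta_j}{t}\,dt\Big) = \prod_{j=2}^{n} \Big(\frac{U_{n,j-1}}{U_{n,j}}\Big)^{\theta_j}$$

and hence

$$P\Big(\bigcap_{j=2}^{n} (\Theta_{n,j} > \theta_j)\Big) = E \prod_{j=2}^{n} \Big(\frac{U_{n,j-1}}{U_{n,j}}\Big)^{\theta_j} = \prod_{j=2}^{n} \frac{j-1}{\theta_j + j - 1}$$

because the ratios $U_{n,j-1}/U_{n,j}$ are independent with beta($j - 1$, 1) distributions, $2 \le j \le n$.
□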



As before, let $(C_{n,\theta}, \theta \ge 0)$ be the process of refining compositions of $n$, defined either
through indicators as in Theorem 1 or by means of the Poisson construction as in (v)
above. Immediately from the definition, we have:

Corollary 3. $(C_{n,\theta}, \theta \ge 0)$ is a Markov process whose inhomogeneous transition rates
are determined by the rule: if at time $\theta$ the state is the composition of $n$ encoded by some
binary sequence starting with 1, each 0 is switching to 1 at rate $1/(\theta + j - 1)$, where $j$ is
the place of this 0 in the sequence, while all other transition rates are trivial.

¹ Strictly speaking, this defines $C_{n,\theta}$ in terms of the $U_i$'s and $Z_\theta$, rather than through $\Pi_{n,\theta}$. Conditionally
given $\Pi_{n,\theta} = \{B_1, \ldots, B_k\}$ the composition has the same distribution as the sequence $\#B_j$ arranged by
decrease of minimal elements $\min B_j$. Conditionally given the induced partition $\{\#B_j, j \le k\}$ of the integer
$n$, this arrangement is the inverse size-biased ordering of the block sizes.
Proof. Indeed,

$$P(\Theta_j \in [\theta, \theta + d\theta] \mid \Theta_j > \theta) = \frac{(j-1)\,d\theta}{(\theta + j - 1)^2} \Big/ \frac{j-1}{\theta + j - 1} = \frac{d\theta}{\theta + j - 1}.$$

□
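
The hazard-rate computation behind Corollary 3 can be confirmed with exact arithmetic. This is a small sketch, assuming only the distribution function (2); the finite-difference step `h` and the function names are illustrative choices. With (2), the finite-difference hazard equals $1/(\theta + j - 1 + h)$ exactly, which tends to the rate $1/(\theta + j - 1)$ as $h \to 0$.

```python
from fractions import Fraction

def cdf(theta, j):
    """P(Theta_j <= theta) from (2), for j >= 2."""
    theta = Fraction(theta)
    return theta / (theta + j - 1)

def hazard_estimate(theta, j, h=Fraction(1, 10**6)):
    """P(theta < Theta_j <= theta + h | Theta_j > theta) / h."""
    theta = Fraction(theta)
    return (cdf(theta + h, j) - cdf(theta, j)) / (1 - cdf(theta, j)) / h

for j in (2, 3, 7):
    for theta in (Fraction(1, 2), Fraction(3)):
        # finite-difference hazard next to the claimed rate 1 / (theta + j - 1)
        print(j, theta, float(hazard_estimate(theta, j)), float(1 / (theta + j - 1)))
```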



If we view $\Pi_{\infty,\theta}$ together with the total ordering of the blocks, as induced by the
natural order of intervals (recall (iv)), we obtain an exchangeable ordered partition $\Pi^*_{\infty,\theta}$
of $\mathbb{N}$. Let $\Pi^*_{n,\theta}$ be its restriction to $[n]$. The law of $\Pi^*_{n,\theta}$ is given by the ordered version of
the Ewens sampling formula

$$P(\Pi^*_{n,\theta} = \pi^*) = \frac{\theta^{k-1}}{[\theta + 1]_{n-1}} \prod_{j=1}^{k} \frac{n_j!}{n_1 + \cdots + n_j}$$

for every ordered partition $\pi^*$ of $[n]$ with block sizes $(n_1, \ldots, n_k)$. The process $(\Pi^*_{n,\theta}, \theta \ge 0)$
is Markovian for every $n$. The transition mechanism of $(\Pi^*_{n,\theta}, \theta \ge 0)$ is determined by
that of $(C_{n,\theta}, \theta \ge 0)$ and the following allocation rule²: each time a block $B_j$ of size $a$
splits in two fragments of sizes $\xi$ and $\eta$, all $\binom{a}{\xi}$ possible allocations of the elements of $B_j$
among the offspring fragments are equally likely.



4 Further properties 

In principle, the finite dimensional distributions of $(\Pi_{n,\theta}, \theta \ge 0)$ may be determined by
some summations of probabilities determined by the Markov process $(C_{n,\theta}, \theta \ge 0)$. But
such formulas appear to be of limited value. Moreover, the process $(\Pi_{n,\theta}, \theta \ge 0)$ is
not Markovian for $n > 2$, as we now show.

Proof that for $n > 2$ the fragmentation process is not Markovian. Consider the
random time $\Theta_n$ of the first split, that is

$$\Theta_n = \min_{2 \le j \le n} \Theta_{n,j} = \inf\{\theta : \Pi_{n,\theta} \ne \{[n]\}\},$$

where $\Pi_{n,0} = \{[n]\}$ is the initial partition with a single block. To show that the Markov
property of the fragmentation process $(\Pi_{n,\theta}, \theta \ge 0)$ fails for every $n > 2$ we shall
focus on the conditional probability

$$Q(t) := P(\Pi_{n,\theta} = A \mid \Pi_{n,\phi} = A, \Pi_{n,t} = A) = P(\Pi_{n,\theta} = A \mid \Pi_{n,\phi} = A, \Theta_n \le t),$$

where $\phi, \theta$ are considered as parameters, $0 < t < \phi < \theta$, and $A$ is the partition of $[n]$ in
two blocks $\{1\}$ and $\{2, \ldots, n\}$. To disprove the Markov property it is sufficient to show
that $Q(t)$ is not constant as $t$ varies.

² Which is common for all exchangeable fragmentation processes.

Note that $\Pi_{n,\phi} = A$ is only possible when the composition $C_{n,\phi}$ assumes either the
value 1100...0 or the value 100...01, and conditionally given either of these values $\Pi_{n,\phi}$
equals $A$ with probability $1/n$ (as a consequence of exchangeability). Working out this
dichotomy,


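$$P(\Pi_{n,\phi} = A, \Theta_n \le t) = P(\Theta_{n,2} \le t, \Theta_{n,n} > \phi;\ \Theta_{n,3} > \phi, \ldots, \Theta_{n,n-1} > \phi)\,\frac{1}{n} + P(\Theta_{n,2} > \phi, \Theta_{n,n} \le t;\ \Theta_{n,3} > \phi, \ldots, \Theta_{n,n-1} > \phi)\,\frac{1}{n}$$

$$= \left(\frac{t}{t+2-1}\right)\left(\frac{n-1}{\phi+n-1}\right)\frac{P(\phi)}{n} + \left(\frac{1}{\phi+2-1}\right)\left(\frac{t}{t+n-1}\right)\frac{P(\phi)}{n}$$

$$= \left(\frac{t(n-1)}{(t+1)(\phi+n-1)} + \frac{t}{(\phi+1)(t+n-1)}\right)\frac{P(\phi)}{n},$$

where for shorthand $P(\phi) := P(\Theta_{n,3} > \phi, \ldots, \Theta_{n,n-1} > \phi)$. Noting the inclusion

$$\{\Pi_{n,\theta} = A, \Theta_n \le t\} \subset \{\Pi_{n,\phi} = A, \Theta_n \le t\}$$

and applying the above formula to the event on the left-hand side, we compute

$$Q(t) = \frac{P(\Pi_{n,\theta} = A, \Theta_n \le t)}{P(\Pi_{n,\phi} = A, \Theta_n \le t)} = \left(\frac{t(n\theta + 2n - 2) + (n-1)^2(\theta + 1) + \theta + n - 1}{t(n\phi + 2n - 2) + (n-1)^2(\phi + 1) + \phi + n - 1}\right) \frac{(\phi+1)(\phi+n-1)}{(\theta+1)(\theta+n-1)}\,\frac{P(\theta)}{P(\phi)},$$

where the last two factors do not depend on $t$.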

This does not depend on $t$ if and only if

$$\frac{n\theta + 2n - 2}{n\phi + 2n - 2} = \frac{(n-1)^2(\theta + 1) + \theta + n - 1}{(n-1)^2(\phi + 1) + \phi + n - 1},$$

or, equivalently, if and only if the polynomial

$$(n\theta + 2n - 2)\big((n-1)^2(\phi + 1) + \phi + n - 1\big) = n(n^2 - 2n + 2)\theta\phi + n^2(n-1)\theta + 2(n-1)(n^2 - 2n + 2)\phi + 2n(n-1)^2$$

is symmetric in $\theta$ and $\phi$. To maintain symmetry we must have

$$n^2(n-1) = 2(n-1)(n^2 - 2n + 2),$$

which forces positive $n$ to be either 1 or 2. Thus for $n > 2$ the partition-valued process is
not Markovian (while it is trivially Markovian for $n = 1$ or 2).

Transition rates of the Ewens fragmentation. Given the value $\pi$ of $\Pi_{n,\theta}$, the
composition $C_{n,\theta}$ can be recovered by arranging the sequence of block sizes of $\pi$ in the reversed
size-biased order. This property taken together with Corollary 3 allows us to compute the
transition rates. To illustrate the method, suppose that at time $\theta$ the partition $\Pi_{n,\theta}$ is in
state $\pi$ with block sizes $\{a, b\}$, and let $\sigma$ be some nontrivial refinement of $\pi$ with block
sizes $\{\xi, \eta, b\}$, so $\xi + \eta = a$. Suppose first that $a \ne b$, $\xi \ne \eta$. Then $C_{n,\theta} = (a, b)$ with
probability $b/(a + b)$, and $C_{n,\theta} = (b, a)$ with probability $a/(a + b)$. Inspecting all possibilities
we see that the fragmentation process jumps from $\pi$ to $\sigma$ at rate

$$\frac{b}{a+b}\left(\frac{1}{\theta + \xi} + \frac{1}{\theta + \eta}\right)\binom{a}{\xi}^{-1} + \frac{a}{a+b}\left(\frac{1}{\theta + \xi + b} + \frac{1}{\theta + \eta + b}\right)\binom{a}{\xi}^{-1}$$

where the binomial coefficient accounts for the number of ways of allocating the elements
of the splitting block among two new fragments. A moment's thought shows that the formula
is still valid in the case $a = b$; but if $\xi = \eta$ the above expression should be halved.

In principle, there exists a Markovian family of Ewens fragmentations, with the same
transition rates as those of the processes $(\Pi_{n,\theta}, \theta \ge 0)$. However, this seems to be of
little use, because the formulas for these rates become increasingly complicated when the
number of blocks grows.

Comparison with the Ewens fragmentation derived from Kingman's coalescent 

Another family of Ewens fragmentations was derived in [2] from Kingman's coalescent tree.
These fragmentations are not Markovian in the proper sense, and no extended Markov
property for them is known. We show next that the partition-valued process in [2] is
different from the process constructed in this paper.

As before, consider the time of the first split $\Theta_n$. Let $I_n + 1$ be the almost surely
unique index $j$ which makes $\Theta_n = \Theta_{n,j}$. Then the split at time $\Theta_n$ creates a partition
$\Pi_{n,\Theta_n}$ with two blocks of sizes $I_n$ and $n - I_n$. Conditioning on $\Theta_n$ gives

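$$P(I_n = i) = \int_0^\infty \frac{(\theta + i)^{-1}}{\sum_{j=1}^{n-1} (\theta + j)^{-1}}\, P(\Theta_n \in d\theta) \tag{4}$$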

which simplifies to

$$P(I_n = i) = (n-1)! \int_0^\infty \frac{d\theta}{(\theta + i)\,[\theta + 1]_{n-1}}. \tag{5}$$

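In particular, for $n = 4$ this gives

$$P(I_4 = 1) = 6\left(-\log 2 + \tfrac{1}{4}\log 3 + \tfrac{1}{2}\right),$$

$$P(I_4 = 2) = 6\left(\tfrac{1}{2}\log 3 - \tfrac{1}{2}\right),$$

$$P(I_4 = 3) = 6\left(\log 2 - \tfrac{3}{4}\log 3 + \tfrac{1}{6}\right).$$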

Compare the second of these evaluations with the corresponding formula in [2, Section 7.1]
to see that this Ewens fragmentation process evolves differently to the Ewens fragmentation
derived from Kingman's coalescent, which has a different distribution on partitions
of [4] with two blocks at the time of the first split. Neither of these distributions is that
of $\Pi_{4,\theta}$ given that this partition has exactly two blocks, even though this conditional
distribution does not depend on $\theta$. This common conditional distribution is the Gibbs
distribution on partitions of [4] into 2 blocks which assigns probability proportional to
$(n_1 - 1)!(n_2 - 1)!$ to each partition of [4] into two blocks of sizes $n_1$ and $n_2$.
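
The first-split probabilities for $n = 4$ can be checked numerically. The sketch below assumes the integral formula (5) with $i = 1, \ldots, n-1$ and evaluates it by a midpoint rule after the substitution $\theta = u/(1-u)$; the function name, the step count and the logarithmic closed forms (obtained by partial fractions from (5)) are assumptions of this illustration. The last line prints the Gibbs probability of a split into two blocks of size 2, for comparison with $P(I_4 = 2)$.

```python
from math import log

def first_split_prob(n, i, steps=100000):
    """Midpoint-rule evaluation of (5) after the substitution theta = u / (1 - u)."""
    total = 0.0
    for m in range(steps):
        u = (m + 0.5) / steps
        theta = u / (1.0 - u)
        integrand = 1.0 / (theta + i)
        for j in range(1, n):
            integrand *= j / (theta + j)        # builds (n-1)! / [theta+1]_{n-1}
        total += integrand / (1.0 - u) ** 2     # Jacobian d(theta)/du
    return total / steps

closed_forms = {
    1: 6 * (-log(2) + log(3) / 4 + 1 / 2),
    2: 6 * (log(3) / 2 - 1 / 2),
    3: 6 * (log(2) - 3 * log(3) / 4 + 1 / 6),
}
for i in (1, 2, 3):
    print(i, first_split_prob(4, i), closed_forms[i])

# Gibbs weights (n_1 - 1)!(n_2 - 1)! on two-block partitions of [4]: four partitions
# with sizes {1, 3} of weight 2 each, and three with sizes {2, 2} of weight 1 each.
print(3 / 11)  # Gibbs probability that both blocks have size 2
```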

It is still an open question for which $n$ there exists a discrete time fragmentation
process on partitions of $[n]$ whose distribution at time $k$ is the distribution on partitions
of $[n]$ into $k$ blocks which assigns each such partition into blocks of sizes $\{n_1, \ldots, n_k\}$ a
probability proportional to $\prod_{i=1}^{k} (n_i - 1)!$.